Nature Genetics — Latest Matching Preprints

1

Phenome-wide association of multiallelic copy number variation in 422,170 UK Biobank individuals reveals novel genetic loci associated with disease

Eisenberg, M.; Packer, R.; Shrine, N.; Demidov, G.; Pack, H.; Hollox, E. J.; Fawcett, K.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354825 medRxiv

Top 0.1%

39.9%

The contribution of multi-allelic CNVs (mCNVs) to disease risk has not been widely studied. This is largely because they have been difficult to characterise genome-wide at large scale, and are often not strongly associated with flanking SNVs, limiting imputation. Improved understanding of the role of mCNVs in disease risk could lead to novel insights into the pathobiology of disease. We robustly typed 69 mCNVs from UK Biobank whole exome sequences in discovery (n=150,682) and replication sets (n=269,317). Discovery and replication PheWAS used clinically-curated composite phenotypes, built by integrating self-report, primary and secondary health care data, to interrogate these variants in unrelated British individuals of African, European and Central/South Asian ancestries. 173 mCNV-phenotype associations were detected from 26 mCNVs, of which 114 associations replicated. One of eight potentially novel mCNV-phenotype signals was independent of neighbouring associated SNVs: the association of the Sulfotransferase 1A1 and 1A2 genes (SULT1A1/SULT1A2) with estimated glomerular filtration rate (eGFR) in individuals of European ancestry (meta-analysed p=1.05×10^-9, beta=0.016 [0.011; 0.021]). Other potentially novel associations include Golgi phosphoprotein 3 (GOLPH3) with the cardiovascular phenotype bundle branch block in individuals of South Asian ancestry (meta-analysed p=3.35×10^-6, OR=2.13 [1.53; 2.96]) and alpha amylase 2B (AMY2B) with ventricular fibrillation and flutter in individuals of European ancestry (meta-analysed p=2.48×10^-6, OR=1.50 [1.26; 1.78]). In summary, we show that accurate typing at biobank-scale sample sizes can identify associations between traits and mCNVs, acting through a gene dosage relationship. Our work provides several novel likely causative variants contributing to particular traits of clinical importance and immediately suggests a putative functional mechanism for the observed associations.

2

Resolving inflammatory bowel disease risk variants to genes and cell types

Fachal, L.; Zhang, R.; Gettler, K.; Haritunians, T.; Cleynen, I.; Stevens, C. R.; Zhang, Q.; Tastad, C.; Medici, C.; Do, R.; IIBDGC GWAS Group; Abreu, M. T.; Achkarj, J.-P.; Ahmad, T.; Bel Kok, K.; Bernstein, C.; Brooks, J.; Bujanda, L.; Butterworth, J.; Clark, K.; Cummings, F.; D'Amato, M.; Del Buono, J.; Duerr, R. H.; Ellinghaus, D.; Foley, S.; Franchimont, D.; Franke, A.; Hancock, L.; Hart, A.; Hooper, P.; Irving, P.; Jarvis, M.; Johnston, E.; Julia, A.; Kemp, C.; Kennedy, N.; Kupcinskas, J.; Latiano, A.; Lewis, J.; Li, A.; Limdi, J.; Louis, E.; McLaughlin, J.; Moayyedi, P.; Moran, G.; M

2026-05-18 genetic and genomic medicine 10.64898/2026.05.13.26352926 medRxiv

Top 0.1%

37.9%

Inflammatory bowel diseases (IBD), principally Crohn's disease (CD) and ulcerative colitis (UC), are common chronic disorders involving inflammation and often progressive tissue damage. Genome-wide association studies have mapped many risk signals, but the causal variants, effector genes and relevant cellular contexts remain difficult to resolve, limiting mechanistic interpretation and therapeutic translation. Here we performed a multi-ancestry GWAS meta-analysis of 125,992 individuals with IBD and more than 1.2 million controls, identifying 619 independent association signals (374 novel) at 420 IBD regions that account for 77-80% of SNP-based heritability. Fine-mapping resolved 81 high-confidence variants, 41 not previously reported. Although most signals were shared between CD and UC, 39% showed subtype specificity, with UC signals showing stronger enrichment in functional annotations from intestinal epithelial, secretory and enteroendocrine cells, and CD showing stronger genetic correlations with circulating inflammatory biomarkers, including C-reactive protein and glycoprotein acetylation. Latent causal modelling supported a causal effect of decreased high-density lipoprotein on CD risk. By integrating bulk and single-cell eQTL and pQTL resources using colocalisation and Mendelian randomisation, together with coding-variant evidence from exome sequencing, we prioritised 664 candidate effector genes across 341 signals, including 390 newly implicated IBD genes, revealing new biological mechanisms and candidate therapeutic targets supported by human genetics.

3

Genetic determinants of cytokine production in activated human monocytes

Gilchrist, J. J.; Mentzer, A. J.; Jostins, L.; Makino, S.; Naranbhai, V.; Danielli, S.; Nassiri, I.; Knight, J. C.; Fairfax, B. P.

2026-05-13 genetic and genomic medicine 10.64898/2026.05.08.26352736 medRxiv

Top 0.1%

33.0%

Monocyte function plays a central role in human health and mapping the genetic determinants of monocyte gene expression has provided insights into numerous disease processes. However, the relationship between genetic variation and functional cytokine secretion in response to immune stimuli remains poorly characterised. To address this, we have quantified the production of 28 cytokines by monocytes from 366 healthy, European-ancestry donors following activation with lipopolysaccharide (LPS) and interferon gamma (IFNγ). By integrating these data with genomic and transcriptomic data from the same cells we robustly define the regulatory determinants of monocyte cytokine secretion. We identify four genome-wide significant loci affecting monocyte cytokine release, observing both cis and trans regulatory effects on cytokine release. These loci include multi-cytokine trans regulatory activity of the CCR5-Δ32 deletion on secretion of the CCR5-binding cytokines, MIP-1β and RANTES, and a cis regulator of PDGF-BB secretion, which colocalises with GWAS risk loci for ulcerative colitis and primary biliary cirrhosis. We further map the genetics of co-expression to establish relationships between RNA transcription and cytokine protein secretion. In doing so we identify marked enrichment of genes related to lipid metabolism in gene regulatory networks linked to cytokine secretion and identify that the COVID-19 risk locus at OAS1 uncouples OAS1 RNA expression from the secretion of 10 cytokines in response to LPS stimulation.

4

A genome-wide deletion map in 125,730 individuals for novel rare disease gene and variant discovery

McGuigan, A.; Pagnamenta, A. T.; Covill, L. E.; Sampson, J.; Camps, C.; Chen, Y.; Moitra, T.; Chundru, V. K.; O'Heir, E.; Allan, K.; Arno, G.; Broomfield, A.; Delatycki, M.; Lin, S.; Michaelides, M.; Rius, R.; Roscioli, T.; Simons, C.; Webster, A.; White, S. M.; Wilson, L.; Sanders, S. J.; O'Donnell-Luria, A.; Ellingford, J. M.; Taylor, J. C.; Whiffin, N.

2026-05-15 genetic and genomic medicine 10.64898/2026.05.13.26352722 medRxiv

Top 0.1%

32.1%

Structural variants (SVs) can disrupt gene function and contribute to pathogenesis of rare disorders. Here, we created a genome-wide knockout dataset across 125,730 individuals with genome sequencing data in the UK's National Genomic Research Library by leveraging the distinct read-depth signal associated with homozygous deletions. We curated 535,699 high-confidence homozygous deletion SVs, of which 48,735 were rare. These deletions collectively covered 213Mb or 6.92% of the human genome (4.58% of autosomal sequence), revealing substantial tolerance to complete sequence loss. From a subset of 58,022 individuals with rare disease, we identified 295 individuals with likely diagnostic homozygous deletions impacting protein-coding regions of known disease genes. A further 32 individuals had candidate non-coding SVs in or near known disease genes, 19/32 (59.37%) of which disrupted 5'-UTR/promoter regions, revealing promoter deletion as an underappreciated cause of rare disorders. Finally, we identify 43 genes with no known rare-disease association but with exonic homozygous deletions in two or more individuals with consistent phenotypes. We describe in detail PDC (phosducin) in Leber Congenital Amaurosis, GCG (glucagon) for a syndromic neurodevelopmental disorder with gastrointestinal involvement, and ENTPD3 for intellectual disability with autism, as candidate novel disease-associated genes. Overall, we create a genome-wide map of homozygous deletions and demonstrate the power of this dataset for rare disease diagnosis and novel disease-gene discovery.

5

Wavelet Decomposition-Based Genomic Analysis of the Human Electrocardiogram

Zainana, S.; Lauer, L. P.; Kiiskinen, T.; Tibshirani, R. J.; Hastie, T.; Ashley, E.; O'Sullivan, J. W.; Rivas, M. A.

2026-05-24 cardiovascular medicine 10.64898/2026.05.20.26353725 medRxiv

Top 0.1%

26.6%

The electrocardiogram (ECG) encodes the electrical activity of the heart across multiple timescales, yet standard clinical analysis collapses this rich signal into a handful of scalar measurements that discard most of the waveform's structure. Whether the frequency signals lost in this reduction carry heritable biological information relevant to cardiovascular disease risk remains unclear. Here we decompose resting 12-lead ECGs from 47,052 White British UK Biobank participants into 84 frequency-specific energy features using Daubechies-6 wavelet analysis across 12 leads and 7 decomposition levels, and perform independent genome-wide association analyses on each feature. We identify 67 independent loci and refine these to 101 high-confidence causal variants (posterior inclusion probability > 0.80) through Bayesian fine-mapping; associated loci converge on genes governing cardiac conduction and myocardial integrity, including SCN5A, TTN, KCNQ1, and DSP, alongside less-characterized cardiomyopathy candidates. SNP-based heritability estimates range from 0.03 to 0.26, with the strongest signals in mid-frequency bands (D6-D4, ~4-32 Hz) of Lead I and aVR, and strong inter-lead genetic correlations indicate a coordinated genetic architecture underlying the waveform. Integrating these features with FinnGen R12 cardiovascular phenotypes reveals genetic correlations reaching 0.56 with heart failure, driven predominantly by energy in the highest-frequency band (D1, 125-250 Hz), a spectral range routinely filtered from clinical ECGs and previously regarded as acquisition noise. These results reframe the electrocardiogram as a multi-frequency genetic phenotype, expand the set of cardiac loci discoverable from ECG data, and implicate high-frequency cardiac electrical activity as an underexplored dimension of cardiovascular disease risk.
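
The feature count and frequency bands quoted in this abstract follow from the dyadic structure of a discrete wavelet transform, where detail level j covers roughly fs/2^(j+1) to fs/2^j. The minimal Python sketch below reproduces that arithmetic. It assumes a 500 Hz sampling rate (inferred from the stated D1 band of 125-250 Hz, not given in the abstract) and assumes the seven features per lead are the detail levels D1 to D7; the abstract does not say whether an approximation band is included. The function name `detail_bands` is illustrative only.

```python
# Nominal frequency bands of dyadic wavelet detail levels.
# ASSUMPTION: 500 Hz sampling rate, inferred from the abstract's D1 = 125-250 Hz.
# ASSUMPTION: the 7 features per lead are detail levels D1..D7.

def detail_bands(fs_hz, levels):
    """Return {level: (low_hz, high_hz)} for detail levels 1..levels."""
    return {j: (fs_hz / 2 ** (j + 1), fs_hz / 2 ** j) for j in range(1, levels + 1)}

FS_HZ = 500.0
N_LEADS = 12
N_LEVELS = 7

bands = detail_bands(FS_HZ, N_LEVELS)
n_features = N_LEADS * N_LEVELS  # one energy feature per lead and level

for level, (low, high) in bands.items():
    print(f"D{level}: {low:.2f}-{high:.2f} Hz")
print("features:", n_features)
```

Under these assumptions, D4 to D6 together span about 3.9 to 31.25 Hz, which is consistent with the "~4-32 Hz" mid-frequency range in the abstract, and 12 leads times 7 levels gives the 84 features.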

6

A pan-cancer regulatory atlas of 6,983 GWAS variants prioritizes recurrent regulatory annotations and candidate programs at cancer risk loci

Dutta, S.

2026-05-20 genetic and genomic medicine 10.64898/2026.05.16.26353369 medRxiv

Top 0.1%

26.1%

Genome-wide association studies have identified thousands of cancer risk variants in non-coding regions, yet their regulatory mechanisms remain largely uncharacterized. Here we present a regulatory annotation atlas of 6,983 genome-wide significant variants across 23 cancer types, scored using multimodal AlphaGenome predictions and integrated with ENCODE-4, Roadmap Epigenomics, and JASPAR 2024 annotations. Most variants (70.5%) fall outside annotated cis-regulatory elements; 27.7% overlap enhancers and 1.4% overlap promoters. Comparison with 6,626 position-matched eQTL control variants suggests that enhancer-classified variants carry 1.86-fold higher predicted effects (P = 1e-94) and promoter variants 7.84-fold (P = 2.5e-19). A composite prioritization score (RegVar-basic, excluding GWAS-derived pleiotropy and TF disruption, AUC = 0.650; RegVar-full, AUC = 0.675) outperforms CADD (0.499) and LINSIGHT (0.558) in this cancer-gene discrimination benchmark. Within-locus ranking across 2,626 GTEx DAP-G eQTL credible sets shows that RegVar identifies the highest-posterior-probability variant in 47.3% of loci (P = 7.0e-13), while CADD performs at chance. Predicted target genes show 67.7% concordance with GTEx eQTL assignments. Permutation-controlled motif analysis highlights NFKB1, STAT1, IRF1, and ARNT as exploratory permutation-enriched candidate transcription factors at cancer risk loci. This atlas provides a resource for interpreting non-coding cancer susceptibility variants. Because AlphaGenome uses expression-related training data, GTEx-based validations should be interpreted as partially orthogonal rather than fully independent.

7

The Biobank Rare Variant consortium powers the discovery of rare genetic associations through global collaboration

Palmer, D. S.; Hill, B.; Hodgson, S.; Joeloo, M.; Kalantzis, G.; Kousathanas, A.; Koyama, S.; Lu, W.; Namba, S.; Rodriguez, Z. B.; Shortt, J. A.; Sonehara, K.; Vartanian, N.; Vy, H. M. T.; Wade, I. A.; White, S. L.; Baya, N. A.; Chami, N.; Do, R.; Estrada, K.; Finer, S.; Genovese, G.; Guez, J.; Itan, Y.; Kanai, M.; Lassen, F. H.; Matsuda, K.; Moutsianas, L.; Peloso, G. M.; Priit, P.; Rader, D. J.; Rendon, A.; Rocheleau, G.; Sadeghi-Alavijeh, O.; Selvaraj, M. S.; Smit, R. A.; Wang, D.; Wigdor, E. M.; Yu, Z.; Colorado Center for Personalized Medicine; Estonian Biobank Research Team; Genes

2026-05-24 genetic and genomic medicine 10.64898/2026.05.21.26353759 medRxiv

Top 0.1%

25.4%

Rare coding variants can have large effects on disease risk and provide direct routes from human genetics to disease mechanisms and therapeutic targets, but their discovery is constrained by sample size, particularly for low-prevalence diseases. Here we establish the Biobank Rare Variant Analysis (BRaVa) consortium, a global rare variant association resource that integrates sequencing and linked health-record data from ten biobanks and cohorts comprising over 1.2 million individuals across diverse ancestries. We performed gene-based meta-analyses of rare coding variation across 33 clinical endpoints and 11 quantitative traits. Aggregating evidence across biobanks and ancestries identified 514 gene-trait associations, including 31 not previously reported in prior studies or curated association resources following systematic literature review. Notably, 36.1% of gene-level associations were undetectable in any individual biobank, and 91 emerged only through cross-ancestry meta-analysis, demonstrating that federated integration enables discovery beyond the reach of single cohorts. Similar gains were observed at the variant level, where 25.0% of phenotype-locus associations were detectable only through meta-analysis. Effect size estimates were correlated across ancestries with concordant directions of effect, supporting the generalizability of rare variant associations. The identified signals implicate pathways involved in transcriptional and epigenetic regulation, metabolism, vascular and epithelial biology, and immune function, highlighting rare coding variation as an engine for biological discovery across medical record phenotypes. For example, damaging variation in ANKRD12 implicates inflammatory transcriptional dysregulation in asthma and chronic obstructive pulmonary disease, and ultra-rare predicted loss-of-function variants in NAA15 link protein acetylation processes to type 2 diabetes risk. 
BRaVa establishes a scalable framework and freely available community resource for rare variant meta-analysis across global biobanks. Public release of gene- and variant-level association summary statistics provides a reference map of rare coding variant associations to support disease gene discovery, biological interpretation, and therapeutic target prioritization as sequencing-linked health-record resources continue to expand.

8

Decoding causal genes and programs from regulatory variants in aortic valve disease

Briend, M.; Rufiange, A.; Duclos, V.; Mathieu, S.; Kanmacher, T.; Boudreau, D. K.; Gaudreault, N.; Saavedra-Armero, V.; Dagenais, F.; Couture, C.; Joubert, P.; Theriault, S.; Bosse, Y.; Mathieu, P.

2026-05-29 genetics 10.64898/2026.05.26.727981 medRxiv

Top 0.1%

25.2%

Aortic valve disease is common, yet its regulatory mechanisms remain poorly understood. We performed multi-omic profiling of human aortic valve interstitial cells (HAVICs), identifying 11,891 allele-specific chromatin accessibility QTLs (as-caQTLs), 48% novel to this cell type. These variants were enriched in active enhancers, disrupted transcription factor (TF) motifs, particularly AP-1, TEAD and GATA families, and were validated by allele-specific TF binding assays. A fine-tuned deep DNA sequence model prioritized common and rare variants at risk loci predicted to impact chromatin accessibility. Single-cell CRISPRi perturbation of 247 variants identified cis-target genes at 55 as-caQTL elements, including loci without eQTLs. We demonstrate that common regulatory variants controlling elastin and fibrillin impact the development of the aortic valve apparatus. We provide genetic evidence and a mechanistic framework for the contribution of a reduced aortic root size to calcific aortic valve disease (CAVD) risk. Perturbations identified core cell programs led by upstream regulators AHNAK, PDIA6, and RNFT1 converging on extracellular matrix production and iron transport.

9

Shared host-genetic architecture between gut microbiota and internalizing psychopathology

Velez-Pardo, P.; Solano, R. J.; Quinchia-Figueroa, A. M.; Montoya Monsalve, R.; Moratto-Vasquez, N. S.

2026-06-02 psychiatry and clinical psychology 10.64898/2026.05.31.26354553 medRxiv

Top 0.1%

25.0%

Whether gut microbial composition is causally linked to mental illness, or merely correlated with it, remains unresolved. Using genetic variants as natural instruments (Mendelian randomization, MR), we tested the genetically predicted effects of 211 gut microbial taxa on nine psychiatric and psychopathology-related phenotypes, using the largest available genome-wide association studies. Across 1,898 valid tests, seven taxon-outcome associations passed false-discovery-rate correction (FDR < 0.05), and all fell on the internalizing spectrum (depression, neuroticism and insomnia) rather than on bipolar disorder, schizophrenia or ADHD; they included a protective association of the Mollicutes/Tenericutes clade with depression (beta = -0.073, p = 1.5e-6) and of Butyrivibrio with neuroticism, and a deleterious association of Betaproteobacteria with neuroticism. Conservative tests tempered any per-locus causal reading: Bayesian colocalization gave PP.H4 < 0.05 at every locus and CAUSE found 0 of 45 pairs genuinely causal; yet the direction of effect matched the protective-versus-deleterious hypothesis in 33 of 45 pairs (binomial p = 1.2e-3). Modelling the shared genetics of the nine phenotypes placed these taxa specifically on a latent internalizing factor (correlation 0.48 with a separate psychotic factor) and, in a bifactor model, on internalizing-specific genetic variance beyond a general psychopathology factor. Selected gut microbial taxa and internalizing psychopathology therefore appear to share host genetics rather than a direct microbe-to-disorder causal chain. We release the full analysis as an open resource for larger microbiome studies, brain-tissue follow-up and experimental tests of candidate mechanisms.
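
The direction-of-effect result in this abstract (33 of 45 pairs, binomial p = 1.2e-3) can be checked with an exact binomial tail probability. The sketch below assumes a one-sided test against a null concordance probability of 0.5; the abstract does not state the sidedness or the null, so both are assumptions, and the function name `binomial_upper_tail` is illustrative.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# ASSUMPTION: one-sided test, null probability 0.5 of matching the hypothesised direction.
p_value = binomial_upper_tail(33, 45)
print(f"one-sided binomial p = {p_value:.2e}")
```

Under these assumptions the tail probability comes out at roughly 1.2e-3, in line with the value reported in the abstract; a two-sided test would give about twice that.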

10

Exome sequencing directly implicates 68 genes in inflammatory bowel disease

Zhu, R.; Zhang, Q.; Yuan, K.; Zhang, R.; Turvey, A. K.; Stevens, C. R.; Fachal, L.; IIBDGC Sequencing Group; Ahmad, T.; Bel Kok, K.; Bernstein, C. N.; Bokemeyer, B.; Brant, S. R.; Brooks, J.; Butterworth, J.; Cho, J. H.; Clark, K.; Cummings, F.; Duerr, R. H.; Ennis, S.; Farkkila, M.; Faubion, W. A.; Foley, S.; Franchimont, D.; Franke, A.; Hancock, L.; Hart, A.; Hooper, P.; Irving, P.; Jarvis, M.; Johnston, E.; Karlson, E. W.; Kemp, C.; Kennedy, N.; Kupcinskas, J.; Lamb, C.; Lees, C.; Lewis, J.; Li, A.; Limdi, J.; Loescher, B.-S.; Louis, E.; McCauley, J. L.; McGovern, D.; McLaughlin, J.; Moa

2026-05-12 genetic and genomic medicine 10.64898/2026.05.08.26352648 medRxiv

Top 0.1%

22.9%

Inflammatory bowel disease (IBD) is a chronic immune-mediated disorder of the gastrointestinal tract whose genetic basis is only partly resolved because most risk variants identified by genome-wide association studies (GWAS) lie in non-coding regions, limiting direct gene assignment and biological interpretation [1,2]. Here we analysed whole-exome and whole-genome sequencing data from 86,213 cases and 478,363 controls to define the contribution of protein-altering variation to IBD susceptibility. We identify 68 genes directly implicated by coding variation, including genes supported by single-variant associations and ultra-rare mutational burden. 57 of these genes lie within regions previously highlighted by GWAS, indicating convergence of regulatory and protein-altering evidence in IBD. The implicated genes point to coherent biological themes, including post-transcriptional control of inflammatory programmes, epithelial restitution, and calibrated immune pathway signalling, and nominate targets with therapeutic relevance. These results show that large-scale sequencing can resolve disease genes and pathways that remain ambiguous from non-coding association alone, providing a more direct route from human genetics to biological insight and therapeutic hypotheses.

11

CMAPS: Causal Mediation Analysis of Perturbation Screens with Application to Genome-scale Perturb-seq Data

Duan, J.; Kang, H.; Keles, S.

2026-05-23 genomics 10.64898/2026.05.21.726924 medRxiv

Top 0.1%

22.3%

CRISPR-Cas9 perturbation screens coupled with single-cell multi-omic profiling enable dissection of gene regulatory mechanisms, yet existing analyses largely quantify total perturbation effects and offer limited insight into the molecular intermediates that transmit these effects. We introduce CMAPS (Causal Mediation Analysis for Perturbation Screens), a semiparametric framework for robust mediation analysis that accommodates unmeasured mediator-outcome confounding and incorporates an adaptive bootstrap test with false discovery rate control. Simulations and data-driven computational experiments show that CMAPS yields accurate, calibrated mediation estimates and robust mediator identification, as confirmed through negative controls and permutation-based validation. Applied to K562 Perturb-seq, CMAPS recapitulates transcriptional cascades downstream of GATA1. In BT16 MultiPerturb-seq data, CMAPS identifies promoter-centric, enhancer-distributed, and mixed cis-regulatory programs linking chromatin remodeling factors to transcriptional responses. CMAPS provides a rigorous and interpretable framework for mechanistic inference in single-cell perturbation screens. CMAPS is implemented in R and is available at https://github.com/keleslab/CMAPS.

12

Systematic common and rare variant association testing in 392,030 whole genomes in All of Us

Lu, W.; Carroll, R. J.; Solomonson, M.; Guez, J.; He, M. K.; Marten, D. J.; Martinez-Carrosco, A.; Wang, Y.; Dowd, C. S.; Kanai, M.; Gorissen, B. L.; Kouame, A. J. S.; Brogan, J.; Waxse, B. J.; Samarakoon, R.; Cook, J. A.; Qian, J.; Zhou, Y.; Choi, K. W.; Basford, M.; Lyons, M.; Linder, J. E.; Stewart, S.; Gupta, N.; Schultz, P.; Goldstein, D.; Llanwarne, C.; Goldstein, J. I.; Higham, E. G. C.; King, D. C.; Palmer, D. S.; Elenbaas, J. S.; Rohlicek, G. K.; He, Q.; Goodrich, J. K.; The All of Us Research Program Genomics Investigators; Smoller, J. W.; Lichtenstein, L.; Gabriel, S. B.; Martin,

2026-05-12 genetic and genomic medicine 10.64898/2026.05.08.26350964 medRxiv

Top 0.1%

22.2%

Large-scale genome-wide association studies (GWAS) and rare variant association studies (RVAS) from population biobanks provide valuable resources for gene discovery in complex human traits. We present an analysis of the All of Us Research Program v8 release, which includes whole genome sequencing data and harmonized phenotypic information of 392,030 participants after quality control, enabling a unified investigation of rare and common variants across a spectrum of human traits and diseases. We build an extensive phenome- and genome-wide ("All by All") computational framework to perform GWAS and RVAS on 3,602 phenotypes and identify 49,863 approximately independent, high-quality single-variant and gene-level associations. Meta-analyses of All of Us and UK Biobank, with sample sizes as large as 786,871 participants, further enhance statistical power and find 193 pLoF gene-phenotype associations that are not significant in either cohort alone, including 22 associations not highlighted by previous studies. We also present a public interactive browser that integrates association results for common and rare variants to facilitate interpretation and rapid querying of summary statistics, along with supporting documentation, and a Featured Workspace in the All of Us Researcher Workbench. Our framework will apply to iterative data releases as All of Us grows, empowering researchers worldwide to uncover insights into the functional effects of genetic components on complex traits and diseases.

13

Identification of Genomic Loci Associated with Cellular Rhythms in Diversity Outbred Mice

Fu, C.; Kim, S.-M.; Philip, V.; Gagnon, L.; McClung, C. A.; Chesler, E.; Logan, R. W.

2026-05-29 genetics 10.64898/2026.05.28.728467 medRxiv

Top 0.2%

21.7%

Inter-individual variation in human molecular and behavioral circadian rhythms motivates genetic dissection in model systems with human-like diversity. We quantified cellular clock phenotypes from primary skin fibroblasts of several hundred Diversity Outbred (DO) mice (each carrying a unique mosaic of eight founder genomes) by longitudinal bioluminescence recordings of a Bmal1-luciferase reporter (LumiCycle). Canonical rhythm parameters (period, phase, amplitude, damping) were extracted and exhibited broad variability (heritability ≈13-35%), exceeding the ranges of founder strains. We performed genome-wide QTL mapping with R/qtl2 (linear mixed models with sex and experimental group covariates, kinship control, permutation-based significance, 1.5-LOD support intervals). A suggestive QTL for amplitude localized to chromosome 12 (LOD 6.9; ~7.5-12.0 Mb), with founder effects indicating higher amplitude for C57BL/6J and lower for PWK/PhJ. Among 21 protein-coding genes in this interval, Apob (apolipoprotein B), a clock-regulated determinant of lipoprotein assembly, emerged as a strong candidate for amplitude control. A phase QTL mapped to chromosome 1 (support interval ~1.36 Mb) with divergent founder effects (C57BL/6J, NOD/ShiLtJ, WSB/EiJ: delayed; PWK/PhJ, CAST/EiJ: advanced) and prioritized candidates including Epha4 (an Eph receptor tyrosine kinase implicated in photic entrainment) and Acsl3. Integrative analysis in GeneWeaver connected QTL gene sets to prior loci for voluntary alcohol consumption and circadian period on proximal chromosome 12, and highlighted overlaps with GWAS signals for adolescent idiopathic scoliosis and schizophrenia, suggesting shared pathways between circadian regulation, metabolism, and neurobehavioral traits.
Together, these findings define reproducible genomic loci for cellular clock phenotypes in a highly recombinant population, nominate tractable candidate genes (Apob, Epha4/Acsl3) for mechanistic follow-up, and illustrate how high-diversity mouse genetics bridges cellular circadian variation with complex disease biology.
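
A LOD-drop support interval, as mentioned in this abstract, is the region around a QTL peak where the LOD score stays within a fixed drop (here 1.5) of the maximum. The Python sketch below illustrates the idea on made-up toy values that are not taken from the preprint. It is a simplified illustration and not the R/qtl2 implementation: it keeps only the contiguous markers at or above the threshold, whereas mapping software may widen the interval to the flanking markers. The function name `lod_support_interval` is hypothetical.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (left, peak, right) positions of a LOD-drop support interval.

    Simplified: keeps contiguous markers with LOD >= peak LOD - drop.
    """
    if len(positions) != len(lods) or not lods:
        raise ValueError("positions and lods must be non-empty and equal in length")
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak_idx] - drop
    left = peak_idx
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[peak_idx], positions[right]

# Toy LOD profile (hypothetical values, positions in Mb).
toy_positions = [1, 2, 3, 4, 5, 6]
toy_lods = [2.0, 3.6, 5.0, 4.2, 3.4, 1.0]
interval = lod_support_interval(toy_positions, toy_lods)
print(interval)
```

With the toy profile, the peak LOD of 5.0 gives a threshold of 3.5, so the interval spans the markers at 2 and 4 Mb around the peak at 3 Mb.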

14

Relational biological structure improves fine-mapping of causal GWAS variants under weak signal

Estaji, E.; Zhao, S.-W.; Chen, Z.-Y.; Nie, S.; Mao, J.-F.

2026-05-16 genomics 10.64898/2026.05.15.725513 medRxiv

Top 0.2%

21.6%

Linkage disequilibrium (LD) makes causal GWAS variants indistinguishable from correlated neighbours; resolving them is the fine-mapping problem, and the challenge is species-specific: humans face dense ancestry-imbalanced LD, yeast and Arabidopsis exceptionally long LD, and crop germplasm sparse and fragmented annotations that defeat human-biobank curation pipelines. Bayesian fine-mappers integrate annotations as flat per-variant priors, discarding the relational structure linking variants to tissue-specific eQTLs, pathways and protein-protein interactions. Hierarchical belief propagation (HBP) on a variant-gene-pathway factor graph matches Bayesian baselines at 5-40x speed; an annotation-adaptive complement, graph-augmented fine-mapping (GAFM), wins 27-2 against SuSiE at weak signal and recovers LDLR, APOE, LPL, GCKR and ANGPTL3 at single-variant resolution across four Pan-UK Biobank ancestries. On the 3,000 Rice Genomes grain weight + shape panel, mixture-prior posterior reweightings of GAFM/HBP and their ensemble (GAFM-MX, HBP-MX, ENS) reach 47.6% top-1-PIP exact-position recovery of 21 panel-matched stable QTNs, the highest of any method, exceeding SuSiE (28.6%) and SBayesRC (14.3%), at 200-700x SuSiE's per-locus speed. Across 692 leads in four species, a non-uniform per-variant prior, not uniform high coverage, lets the graph break LD ties: adding a regulatory-element flag to an otherwise uniform human cache flips HBP narrower than GAFM from 0% to 88% on 321 Pan-UKB leads. These results recast multi-omics fine-mapping as a non-uniform-prior-curation problem rather than a uniform-coverage problem, and reframe post-GWAS analysis as message passing over biological structure rather than weighted regression on flattened annotations.

15

Systematic identification of disease-associated 3D neighborhoods in protein structures

Finucane, H. K.; Nason, E.; Gerges, S.; Satterstrom, F. K.; Gorissen, B.; Liao, R.; Panagiotaropoulou, G.; Guez, J.; The Autism Sequencing Consortium; Karczewski, K.; Daly, M. J.

2026-06-02 genetic and genomic medicine 10.64898/2026.05.29.26354366 medRxiv

Top 0.2%

19.3%

Rare variant association studies (RVAS) have identified hundreds of genes contributing to human disease, yet gene-level signals provide limited insight into the molecular mechanisms underlying pathogenicity. Missense variants, which can be mapped onto three-dimensional protein structures, offer an opportunity to gain novel mechanistic insights. Here, we develop a scalable framework for systematically mapping case and control variants onto protein structures and identifying spatially localized regions enriched for case variants. Our framework builds on the 3D Neighborhood Test (3DNT), which we recently introduced in a single-gene analysis of ATP2B2, and enables the genome-wide analysis of rare coding variation beyond standard gene-level approaches. We applied 3DNT across multiple large-scale datasets, including Mendelian disease variants from ClinVar, de novo mutations from 37,486 autism spectrum disorder (ASD) probands, and case-control exome sequencing cohorts for epilepsy and schizophrenia. We identified significant clusters in 872 genes for Mendelian disease, in 70 genes for autism, in one gene for epilepsy, and in three genes for schizophrenia. These clusters are strongly enriched for known functional sites and provide insight into both known and previously unrecognized disease genes. Our results demonstrate that scalably integrating RVAS data with protein structure predictions localizes disease-associated variation to specific functional regions and reveals a layer of disease biology that is largely invisible to standard analyses.

16

Multimodal atlas of human atherosclerosis links granular vascular cell states to coronary artery disease risk

Mosquera, J. V.; Tang, I.; Murach, M.; Auguste, G.; Kodali, A.; Hart, P.; Shaw, D. M.; Li, M.; Turner, A. W.; Hodonsky, C. J.; Dworak, N. M.; de Oliveira, A. K.; Sol-Church, K.; Jhee, T.; van der Sijs, K. I. M.; Adkar, S. S.; Choi, R. B.; Vacante, F.; Wu, J. C.; Cheng, P.; Giannarelli, C.; Leeper, N. J.; Finn, A. V.; Bjorkegren, J. L. M.; Kovacic, J. C.; Yurdagul, A.; van der Laan, S. W.; Miller, C. L.

2026-05-26 cardiovascular medicine 10.64898/2026.05.24.26353986 medRxiv

Top 0.2%

19.2%

Advances in single-cell and spatial assays have revolutionized the scale and resolution of molecular tissue profiling. Here we present MetaPlaq, a multimodal atlas of human atherosclerotic arterial beds comprising over a million cells across single-cell transcriptomics, epigenomics and high-resolution spatial expression assays. We map granular cell states and disease-relevant transcriptional programs within the native tissue context of coronary arteries. Furthermore, we map cardiovascular GWAS signals to smooth muscle cells (SMCs) and endothelial cells (ECs) and uncover the cis-regulatory architecture governing their phenotypic transitions. Our comprehensive epigenomic reference allowed us to build cell-specific enhancer-gene link maps and multimodal gene regulatory networks (GRNs) underlying disease-relevant states such as osteogenic SMCs and ECs undergoing mesenchymal transition. We also integrate SMC and EC disease-associated gene sets with GRNs to nominate key transcription factors such as PRRX1, BNC2 and ELK3 regulating atherosclerosis-relevant transcriptional programs. Finally, we layer single-cell and spatial modalities to fine-map GWAS variants with improved cell and anatomical context. We highlight candidate cell-specific regulatory mechanisms at less characterized CAD loci, including FGD5 and MCF2L in ECs. Together, this atlas represents an important step towards fully interpreting genetic risk loci and informing new therapeutic strategies for cardiovascular disease.

17

Calibrated Prediction Intervals for Polygenic Scores: Updated Comparisons, Contextual Calibration, and Data Normalization

Chang, X.; Hou, S.; Zhou, X.

2026-05-19 genetic and genomic medicine 10.64898/2026.05.15.26353336 medRxiv

Top 0.2%

19.1%

Calibrated prediction intervals for polygenic scores (PGS) are essential for communicating individual-level uncertainty in genomic medicine. We present updated comparisons of two methods for constructing such intervals: CalPred, a parametric approach, and PredInterval, a non-parametric approach. Our results show that both methods can achieve calibrated coverage, although CalPred additionally requires a sufficiently large calibration set. The two methods also exhibit complementary trade-offs with respect to dataset size and risk identification. We further show that contextual calibration, as introduced in Hou et al. and followed in Shi et al., is most naturally achieved through appropriate phenotype normalization and data preprocessing. Apparent miscalibration can arise from inadequate normalization or from providing contextual information to some methods but not others. In UK Biobank, standard GWAS phenotype normalization procedures are sufficient to achieve contextual calibration for the traits analyzed. In the extreme simulations of Hou et al. and Shi et al., supplying contextual covariates to PredInterval restores contextual calibration without normalization, and appropriate normalization can achieve contextual calibration without supplying covariates, while also substantially improving upstream tasks including association power and PGS accuracy. Together, these results underscore the central role of phenotype normalization and data preprocessing in GWAS analyses, including reliable uncertainty quantification for PGS.
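To make "calibrated coverage" concrete, the sketch below builds a 90% prediction interval with generic split-conformal calibration and checks how often the interval contains the true phenotype in held-out data. This is an illustration only: it is neither CalPred nor PredInterval, the data are simulated, and the function names, the effect size of 0.5 and the sample sizes are all assumptions chosen for the example.

```python
import math
import random


def conformal_halfwidth(calib_true, calib_pred, alpha=0.1):
    """Half-width of a (1 - alpha) interval from absolute calibration residuals.

    Generic split-conformal rule (an assumption for this sketch, not the
    procedure used by CalPred or PredInterval).
    """
    scores = sorted(abs(y - p) for y, p in zip(calib_true, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return float("inf")  # calibration set too small for this alpha
    return scores[k - 1]


def empirical_coverage(test_true, test_pred, halfwidth):
    """Fraction of test phenotypes inside [pred - halfwidth, pred + halfwidth]."""
    hits = sum(abs(y - p) <= halfwidth for y, p in zip(test_true, test_pred))
    return hits / len(test_true)


def simulate(n, rng):
    """Toy data: phenotype = 0.5 * PGS + standard normal noise (hypothetical)."""
    pgs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pheno = [0.5 * g + rng.gauss(0.0, 1.0) for g in pgs]
    pred = [0.5 * g for g in pgs]  # point prediction from the PGS
    return pheno, pred


rng = random.Random(0)
calib_y, calib_pred = simulate(2000, rng)
test_y, test_pred = simulate(2000, rng)

halfwidth = conformal_halfwidth(calib_y, calib_pred, alpha=0.1)
coverage = empirical_coverage(test_y, test_pred, halfwidth)
print(f"half-width={halfwidth:.3f}, empirical coverage={coverage:.3f}")
```

An interval method is calibrated in this sense when the empirical coverage is close to the nominal level (here 0.90). Contextual calibration asks for the same property within subgroups, for example by repeating the coverage check separately per covariate stratum.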

18

Reversion mutations define minimal BRCA1/2 requirements for therapy resistance

Magraner-Pardo, L.; Kerrison, W.; Krastev, D. B.; Alcraft, R.; Xiao, H.; Brough, R.; Song, F.; Choi, S.; Gulati, A.; Rodrigues, M.; Labidi-Galy, I.; Pujade-Lauraine, E.; Ray-Coquard, I.; Haider, S.; Pettitt, S. J.; Tutt, A. N.; Lord, C. J.

2026-05-25 cancer biology 10.64898/2026.05.22.727163 medRxiv

Top 0.2%

18.8%

Although platinum salts or PARP inhibitors are effective in delivering anti-tumor responses in people with BRCA1 or BRCA2 mutated cancers, drug resistance is common and is often caused by secondary BRCA1/2 reversion mutations that restore function. By collating and analyzing 848 BRCA1/2 reversion mutations in 384 cancer patients with drug resistance, we confirm that pathogenic BRCA1/2 mutation type influences the acquisition of reversions, and that large BRCA1/2 deletions are an underappreciated form of reversion. Integrating reversion data with systematic CRISPR-Cas9 screens that delete BRCA1/2 exons, we also show that both proteins contain privileged domains whose structure is essential for drug resistance, including the PALB2 interacting domains of both BRCA1 and BRCA2. Reversions in PALB2 also conserve both BRCA1 and BRCA2 binding domains. Surprisingly, exon 11 of BRCA2, which encodes BRC repeats 1-8, is not essential for resistance. Using this patient and functional information, we estimate the likelihood of pathogenic BRCA2 mutations to revert. We show that risk of reversion correlates with both the presence of clinical reversions and the response to treatment, suggesting that the propensity to revert could be a useful clinical parameter.

19

Distinct and shared genetics of kidney filtration function versus albuminuria revealed by multi-trait GWAS

de Hesselle, H. C.; Garben, B.-F.; Stark, K. J.; Warth, R.; Teumer, A.; Pattaro, C.; Heid, I. M.; Winkler, T. W.

2026-06-09 genetic and genomic medicine 10.64898/2026.06.08.26355141 medRxiv

Top 0.2%

18.7%

Chronic kidney disease is characterized by decreased estimated glomerular filtration rate (eGFR, derived from serum creatinine or cystatin C) or increased urinary albumin-to-creatinine ratio (UACR). Genome-wide association studies have characterized the genetic make-up of these traits, but their overlap remained largely unknown. Our multi-trait GWAS (N=1M) identified 812 signals, and multi-trait fine-mapping sharpened the identification of likely causal variants. Of 333 signals classified for filtration function or albuminuria, only 11 overlapped. Their effects on eGFR and UACR were directionally concordant, dominated by eGFR and independent of HbA1c or mean arterial pressure. Mapped genes pinpointed mechanisms related to glomerular filtration area (SHROOM3, EPB41L5) and sodium-mediated intraglomerular pressure (NRBP1, DPEP1/CHMP1A). Genetics of fluid intake resulted in shadow effects on UACR without albumin leakage into urine. Our multi-trait approach sharpened the identification of likely causal genes for kidney traits, demonstrated largely distinct genetics for filtration function versus albuminuria, and provided new biological insights into the overlap.

20

AFQuery: a bitmap-indexed, capture-aware allele frequency engine for clinical genomics cohorts

Santos-Diaz, G.; Toro-Barrios, N.; Carmona, R.; Uria-Regojo, G.; Jimenez-Arias, R.; Gurriaran, X.; Ramilo, P.; Amigo, J.; Minguez, P.; Dopazo, J.; Lopez-Lopez, D.

2026-05-22 genetic and genomic medicine 10.64898/2026.05.15.26353174 medRxiv

Top 0.2%

18.6%

Motivation: Allele frequency (AF) is central to clinical variant classification under ACMG/AMP guidelines. Public reference databases offer broad ancestry coverage, but local ancestries, rare-disease enrichment, and institutional case distributions are often underrepresented, so cohort-derived AF is a valuable complement. Computing accurate AF from institutional cohorts is nonetheless error-prone: even successive versions of the same capture kit cover substantially different target regions, and naive methods inflate the allele number (AN) at positions not shared by all kits, deflating AF and biasing ACMG frequency evidence toward pathogenic categories. Results: We present AFQuery, a bitmap-indexed AF engine that computes capture-aware, ploidy-aware allele frequencies from pre-indexed Roaring Bitmaps in ~14 ms per point query (~34 ms for 1-Mbp region queries), independently of cohort size up to 50,000 samples. In simulated mixed-technology cohorts, capture-aware AN reduced AF mean absolute error 8-13-fold and removed the systematic bias toward pathogenic ACMG categories, yielding 10-45-fold fewer spurious pathogenic-evidence calls. Availability: AFQuery is freely available under the MIT licence at https://github.com/babelomics/afquery.
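The AN inflation described above can be shown with a few lines of arithmetic. The sketch below is not AFQuery's API and does not use Roaring Bitmaps; it is a plain-Python illustration in which the `Sample` class, the kit labels `kit_v1` and `kit_v2`, and the cohort composition are all hypothetical. It compares a naive AN, which counts every sample, with a capture-aware AN, which counts only samples whose kit covers the queried position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    kit: str          # capture kit used for this sample (hypothetical label)
    alt_alleles: int  # ALT alleles observed at the queried position
    ploidy: int = 2   # chromosome copies at the position (1 for hemizygous)


def allele_frequency(samples, covered_kits, capture_aware=True):
    """Return (AC, AN, AF) at a single position.

    covered_kits: set of kit labels whose target regions include the position.
    With capture_aware=False every sample contributes to AN, which is the
    naive behaviour that deflates AF at positions some kits do not cover.
    """
    ac = 0
    an = 0
    for s in samples:
        covered = s.kit in covered_kits
        if capture_aware and not covered:
            continue  # sample cannot be genotyped here, so it adds no alleles
        an += s.ploidy
        if covered:
            ac += s.alt_alleles
    af = ac / an if an else float("nan")
    return ac, an, af


# Toy cohort: 100 samples on kit_v1 (2 heterozygous carriers) and
# 100 samples on kit_v2, which does not cover the position.
samples = (
    [Sample("kit_v1", 1)] * 2
    + [Sample("kit_v1", 0)] * 98
    + [Sample("kit_v2", 0)] * 100
)
covered = {"kit_v1"}

naive = allele_frequency(samples, covered, capture_aware=False)
aware = allele_frequency(samples, covered, capture_aware=True)
print("naive  AC, AN, AF:", naive)
print("aware  AC, AN, AF:", aware)
```

In this toy cohort the naive AN is twice the capture-aware AN, so the naive AF is half the capture-aware value. A variant that sits near a frequency threshold could therefore appear rarer than it is, which is the direction of bias toward pathogenic evidence that the abstract describes.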